1 Introduction
1.1 Background
Truly understanding customers is key to a successful business. Companies increasingly leverage data analytics to uncover insights into their customers’ needs, preferences, and socio-demographic profiles. This information is essential for enhancing user experiences, differentiating from competitors, and creating products that satisfy market demands. Moreover, understanding customers improves strategic planning by facilitating informed decisions about market positioning, pricing, and promotional strategies. By aligning marketing strategies with consumer needs, businesses can achieve more successful and impactful outcomes.
Traditional approaches to understanding customers, such as surveys, focus groups, and market research reports, have long been relied upon. While these methods can provide useful insights, they have significant limitations. Surveys, for instance, often suffer from low response rates and biased results. Focus groups may not accurately represent the entire customer base, leading to incomplete or skewed findings. Market research reports, while detailed, are frequently expensive and time-consuming to produce. Additionally, these traditional methods typically provide only a static view of customer preferences, failing to capture the dynamic changes in consumer behavior over time.
To address these limitations, businesses are increasingly turning to more innovative approaches. One such approach is the analysis of customer-generated content, such as product reviews. Product reviews offer valuable insights into customer preferences and sentiments. Various techniques, including sentiment analysis, topic modeling, and natural language processing, can be employed to extract meaningful information from this user-generated content.
1.2 Research Objectives
Traditionally, product reviews have been analyzed primarily on feedback about the product, such as customer opinions, satisfaction levels, and sentiments on specific features. However, insights into customers’ personal attributes, behaviors, and usage contexts have not been thoroughly explored. Addressing this gap presents a valuable opportunity to leverage user-generated content for a deeper understanding of customers.
The extraction of personal attributes from product reviews is the focus of this research. Personal attributes are specific characteristics that uniquely describe an individual’s traits, lifestyle, and circumstances, such as physical appearance, profession, household composition, and living situation. These attributes could include details such as hair type, skin condition, profession, number of family members or pets, and whether they live in an apartment, townhouse or single-family home. By extracting personal attributes from product reviews, we aim to gain a deeper understanding of the characteristics of customers.
1.4 Approach
In line with previous research on extracting personal attributes, this study will also use the triplet format to store these attributes. However, our approach diverges by addressing some of the key limitations of earlier studies, including that previous research often relied on predefined lists of attributes. Inspired by recent advancements in pre-trained large language models (LLMs), this work introduces a novel approach for extracting personal attributes using these models. Our method aims to extract triplets from product reviews without being limited by predefined attribute categories, thus offering a more comprehensive and adaptable solution to capture the various personal attributes revealed in customer feedback.
We present an end-to-end pipeline, starting from raw text all the way through to a constructed knowledge graph. The approach , text pre-processing, triplet extraction post-processing.
1.5 Main Contributions
The main contributions of this thesis are as follows: * Pipeline: We developed an end-to-end pipeline that leverages large language models (LLMs) to automate the construction of Knowledge Graphs from unstructured data. * Comparison: We provided a comparison between our LLM-based approach and traditional triplet extraction methods. * Case Study Implementation: We conducted a detailed case study to evaluate the performance of our framework. * Future Direction: We identified potential areas for future research and development, offering insights into how the end-to-end pipeline can be further improved and extended.
1.6 Thesis Outline
Placeholder for thesis outline